Using Semantic Graphs and Word Sense Disambiguation Techniques to Improve Text Summarization
Authors
Abstract
This paper presents a semantic graph-based method for extractive summarization. The summarizer uses WordNet concepts and relations to build a semantic graph that represents the document. A degree-based clustering algorithm then discovers the different themes or topics in the text. Sentences are selected for the summary according to whether they contain the most representative concepts of each topic. The method proves to be an efficient way of identifying salient concepts and topics in free text. On the DUC data for single-document summarization, our system achieves significantly better results than previous approaches based on terms and purely syntactic information. Moreover, the system can easily be ported to other domains, because this only requires changing the knowledge base and the concept-annotation method. Finally, we address the problem of word ambiguity in semantic approaches to automatic summarization.
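The pipeline outlined in the abstract (concept graph, degree-based clustering into topics, topic-driven sentence selection) can be sketched in Python as follows. This is a minimal illustration under stated assumptions and not the authors' implementation. Sentences are assumed to be already annotated with disambiguated concept IDs, and a small hand-written `HYPERNYMS` map stands in for WordNet relations. The names `build_graph`, `degree_clusters`, `select_sentences` and the `n_hubs` parameter are hypothetical. The clustering step is a simplified variant (highest-degree vertices act as hubs, connected hubs form one topic), which may differ from the paper's exact algorithm.

```python
from collections import defaultdict, deque
from itertools import combinations


def build_graph(annotated, hypernyms):
    """Undirected weighted concept graph (dict of dicts).

    Assumption: concepts co-occurring in a sentence are linked, and each
    concept is linked to its hypernym, so related concepts in different
    sentences become connected through shared ancestors.
    """
    graph = defaultdict(lambda: defaultdict(float))

    def add_edge(a, b):
        graph[a][b] += 1.0
        graph[b][a] += 1.0

    for concepts in annotated:
        unique = sorted(set(concepts))
        for a, b in combinations(unique, 2):
            add_edge(a, b)
        for c in unique:
            if c in hypernyms:
                add_edge(c, hypernyms[c])
    return graph


def salience(graph):
    """Weighted degree of each vertex: the sum of its edge weights."""
    return {v: sum(nbrs.values()) for v, nbrs in graph.items()}


def degree_clusters(graph, n_hubs):
    """Simplified degree-based clustering (hypothetical variant).

    The n_hubs highest-degree vertices become hubs; hubs joined by an edge
    form one topic; every other vertex joins the topic whose hubs it is
    most strongly connected to (vertices with no hub links stay out).
    """
    sal = salience(graph)
    ranked = sorted(sal, key=lambda v: (-sal[v], v))
    hubs = set(ranked[:n_hubs])
    hub_sets, seen = [], set()
    for h in ranked[:n_hubs]:
        if h in seen:
            continue
        seen.add(h)
        component, queue = set(), deque([h])
        while queue:  # breadth-first search restricted to hub vertices
            v = queue.popleft()
            component.add(v)
            for u in graph[v]:
                if u in hubs and u not in seen:
                    seen.add(u)
                    queue.append(u)
        hub_sets.append(component)
    clusters = [set(s) for s in hub_sets]
    for v in ranked[n_hubs:]:
        weights = [sum(graph[v].get(h, 0.0) for h in s) for s in hub_sets]
        if weights and max(weights) > 0:
            clusters[weights.index(max(weights))].add(v)
    return clusters


def select_sentences(annotated, graph, clusters, n):
    """Visit topics from most to least salient, each time taking the
    unchosen sentence whose concepts best cover that topic."""
    sal = salience(graph)
    order = sorted(clusters, key=lambda c: -sum(sal[v] for v in c))
    target = min(n, len(annotated))
    if not order:
        return list(range(target))
    chosen = []
    while len(chosen) < target:
        for cluster in order:
            if len(chosen) == target:
                break
            candidates = [i for i in range(len(annotated)) if i not in chosen]
            best = max(candidates, key=lambda i: sum(
                sal.get(c, 0.0) for c in set(annotated[i]) if c in cluster))
            chosen.append(best)
    return sorted(chosen)  # restore document order


# Hypothetical toy document: one list of concept IDs per sentence.
SENTENCES = [
    ["hurricane", "coast", "evacuation"],
    ["hurricane", "storm", "damage"],
    ["election", "candidate", "vote"],
    ["candidate", "debate"],
    ["storm", "coast", "flood"],
]
HYPERNYMS = {"hurricane": "storm"}  # stand-in for a WordNet is-a link

g = build_graph(SENTENCES, HYPERNYMS)
topics = degree_clusters(g, n_hubs=4)
print(select_sentences(SENTENCES, g, topics, n=2))  # [1, 2]
```

On this toy input the storm-related concepts form one topic and the election-related concepts another, so a two-sentence summary takes the best-covering sentence from each. Ranking topics by total salience means the dominant theme is always covered first, which is one plausible reading of "the most representative concepts for each topic".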
Similar resources
Resolving ambiguity in biomedical text to improve summarization
Access to the vast body of research literature that is now available on biomedicine and related fields can be improved with automatic summarization. This paper describes a summarization system for the biomedical domain that represents documents as graphs formed from concepts and relations in the UMLS Metathesaurus. This system has to deal with the ambiguities that occur in biomedical documents....
A new graph based text segmentation using Wikipedia for automatic text summarization
The technology of automatic document summarization is maturing and may provide a solution to the information overload problem. Nowadays, document summarization plays an important role in information retrieval. With a large volume of documents, presenting the user with a summary of each document greatly facilitates the task of finding the desired documents. Document summarization is a process of...
Improving Summarization of Biomedical Documents Using Word Sense Disambiguation
We describe a concept-based summarization system for biomedical documents and show that its performance can be improved using Word Sense Disambiguation. The system represents the documents as graphs formed from concepts and relations from the UMLS. A degree-based clustering algorithm is applied to these graphs to discover different themes or topics within the document. To create the graphs, the...
Multilingual Natural Language Generation within Abstractive Summarization
With the tremendous amount of textual data available on the Internet, techniques for abstractive text summarization become increasingly appreciated. In this paper, we present work in progress that tackles the problem of multilingual text summarization using semantic representations. Our system is based on abstract linguistic structures obtained from an analysis pipeline of disambiguation, synta...
Semantic Methods for Textual Entailment
The problem of recognizing textual entailment (RTE) has been recently addressed using syntactic and lexical models with some success. Here, a new approach is taken to apply world knowledge in much the same way as humans, but captured in large semantic graphs such as WordNet. We show that semantic graphs made of synsets and selected relationships between them enable fairly simple methods that pr...
Journal:
- Procesamiento del Lenguaje Natural
Volume 47, Issue
Pages: -
Publication date: 2011